AITopics | evaluation value

Collaborating Authors

evaluation value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Surrogate Benchmarks for Model Merging Optimization

Akizuki, Rio, Kudo, Yuya, Yoshinari, Nozomu, Hirose, Yoichi, Nishimoto, Toshiyuki, Uchida, Kento, Shirakawa, Shinichi

arXiv.org Artificial IntelligenceSep-3-2025

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their setting affects the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, its optimization process is computationally expensive, particularly in merging LLMs. In this work, we develop surrogate benchmarks for optimization of the merging hyperparameters to realize algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models to predict the performance of a merged model from a hyperparameter. We demonstrate that our benchmarks can predict the performance of merged models well and simulate optimization algorithm behaviors.

artificial intelligence, benchmark, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.02555

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression

Yoshida, Kai, Mizukami, Masahiro, Kawano, Seiya, Kruengkrai, Canasai, Sugiyama, Hiroaki, Yoshino, Koichiro

arXiv.org Artificial IntelligenceJan-25-2025

To improve user engagement during conversations with dialogue systems, we must improve individual dialogue responses and dialogue impressions such as consistency, personality, and empathy throughout the entire dialogue. While such dialogue systems have been developing rapidly with the help of large language models (LLMs), reinforcement learning from AI feedback (RLAIF) has attracted attention to align LLM-based dialogue models for such dialogue impressions. In RLAIF, a reward model based on another LLM is used to create a training signal for an LLM-based dialogue model using zero-shot/few-shot prompting techniques. However, evaluating an entire dialogue only by prompting LLMs is challenging. In this study, the supervised fine-tuning (SFT) of LLMs prepared reward models corresponding to 12 metrics related to the impression of the entire dialogue for evaluating dialogue responses. We tuned our dialogue models using the reward model signals as feedback to improve the impression of the system. The results of automatic and human evaluations showed that tuning the dialogue model using our reward model corresponding to dialogue impression improved the evaluation of individual metrics and the naturalness of the dialogue response.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.12698

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Study of the Proper NNUE Dataset

Tan, Daniel, Medina, Neftali Watkinson

arXiv.org Artificial IntelligenceDec-23-2024

NNUE (Efficiently Updatable Neural Networks) has revolutionized chess engine development, with nearly all top engines adopting NNUE models to maintain competitive performance. A key challenge in NNUE training is the creation of high-quality datasets, particularly in complex domains like chess, where tactical and strategic evaluations are essential. However, methods for constructing effective datasets remain poorly understood and under-documented. In this paper, we propose an algorithm for generating and filtering datasets composed of "quiet" positions--positions that are stable and free from tactical volatility. Our approach provides a clear methodology for dataset creation, which can be replicated and generalized across various evaluation functions. Testing demonstrates significant improvements in engine performance, confirming the effectiveness of our method.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.17948

Country:

North America > United States > California > Riverside County > Riverside (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Games > Chess (0.69)

Add feedback

A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task

Fujisaki, Yuya, Takagi, Shiro, Asoh, Hideki, Kumagai, Wataru

arXiv.org Artificial IntelligenceSep-10-2024

The progress in text summarization techniques has been remarkable. However the task of accurately extracting and summarizing necessary information from highly specialized documents such as research papers has not been sufficiently investigated. We are focusing on the task of extracting research questions (RQ) from research papers and construct a new dataset consisting of machine learning papers, RQ extracted from these papers by GPT-4, and human evaluations of the extracted RQ from multiple perspectives. Using this dataset, we systematically compared recently proposed LLM-based evaluation functions for summarizations, and found that none of the functions showed sufficiently high correlations with human evaluations. We expect our dataset provides a foundation for further research on developing better evaluation functions tailored to the RQ extraction task, and contribute to enhance the performance of the task. The dataset is available at https://github.com/auto-res/PaperRQ-HumanAnno-Dataset.

dataset, evaluation, evaluation function, (13 more...)

arXiv.org Artificial Intelligence

2409.06883

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

Automatic Grouping of Redundant Sensors and Actuators Using Functional and Spatial Connections: Application to Muscle Grouping for Musculoskeletal Humanoids

Kawaharazuka, Kento, Nishiura, Manabu, Koga, Yuya, Omura, Yusuke, Toshimitsu, Yasunori, Asano, Yuki, Okada, Kei, Kawasaki, Koji, Inaba, Masayuki

arXiv.org Artificial IntelligenceSep-1-2024

For a robot with redundant sensors and actuators distributed throughout its body, it is difficult to construct a controller or a neural network using all of them due to computational cost and complexity. Therefore, it is effective to extract functionally related sensors and actuators, group them, and construct a controller or a network for each of these groups. In this study, the functional and spatial connections among sensors and actuators are embedded into a graph structure and a method for automatic grouping is developed. Taking a musculoskeletal humanoid with a large number of redundant muscles as an example, this method automatically divides all the muscles into regions such as the forearm, upper arm, scapula, neck, etc., which has been done by humans based on a geometric model. The functional relationship among the muscles and the spatial relationship of the neural connections are calculated without a geometric model.

functional connection, sensor and actuator, spatial connection, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2021.3060715

2409.00678

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.86)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

Add feedback

Function Smoothing Regularization for Precision Factorization Machine Annealing in Continuous Variable Optimization Problems

Endo, Katsuhiro, Takahashi, Kazuaki Z.

arXiv.org Artificial IntelligenceJul-5-2024

Solving continuous variable optimization problems by factorization machine quantum annealing (FMQA) demonstrates the potential of Ising machines to be extended as a solver for integer and real optimization problems. However, the details of the Hamiltonian function surface obtained by factorization machine (FM) have been overlooked. This study shows that in the widely common case where real numbers are represented by a combination of binary variables, the function surface of the Hamiltonian obtained by FM can be very noisy. This noise interferes with the inherent capabilities of quantum annealing and is likely to be a substantial cause of problems previously considered unsolvable due to the limitations of FMQA performance. The origin of the noise is identified and a simple, general method is proposed to prevent its occurrence. The generalization performance of the proposed method and its ability to solve practical problems is demonstrated.

evaluation point, function surface, hamiltonian, (17 more...)

arXiv.org Artificial Intelligence

2407.04393

Country: Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Many-Objective-Optimized Semi-Automated Robotic Disassembly Sequences

Kiyokawa, Takuya, Harada, Kensuke, Wan, Weiwei, Ishikura, Tomoki, Miyaji, Naoya, Matsuda, Genichiro

arXiv.org Artificial IntelligenceJan-3-2024

This study tasckles the problem of many-objective sequence optimization for semi-automated robotic disassembly operations. To this end, we employ a many-objective genetic algorithm (MaOGA) algorithm inspired by the Non-dominated Sorting Genetic Algorithm (NSGA)-III, along with robotic-disassembly-oriented constraints and objective functions derived from geometrical and robot simulations using 3-dimensional (3D) geometrical information stored in a 3D Computer-Aided Design (CAD) model of the target product. The MaOGA begins by generating a set of initial chromosomes based on a contact and connection graph (CCG), rather than random chromosomes, to avoid falling into a local minimum and yield repeatable convergence. The optimization imposes constraints on feasibility and stability as well as objective functions regarding difficulty, efficiency, prioritization, and allocability to generate a sequence that satisfies many preferred conditions under mandatory requirements for semi-automated robotic disassembly. The NSGA-III-inspired MaOGA also utilizes non-dominated sorting and niching with reference lines to further encourage steady and stable exploration and uniformly lower the overall evaluation values. Our sequence generation experiments for a complex product (36 parts) demonstrated that the proposed method can consistently produce feasible and stable sequences with a 100% success rate, bringing the multiple preferred conditions closer to the optimal solution required for semi-automated robotic disassembly operations.

disassembly, evaluation value, sequence, (16 more...)

arXiv.org Artificial Intelligence

2401.01817

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.86)

Add feedback

VQA-based Robotic State Recognition Optimized with Genetic Algorithm

Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki

arXiv.org Artificial IntelligenceMar-9-2023

State recognition of objects and environment in robots has been conducted in various ways. In most cases, this is executed by processing point clouds, learning images with annotations, and using specialized sensors. In contrast, in this study, we propose a state recognition method that applies Visual Question Answering (VQA) in a Pre-Trained Vision-Language Model (PTVLM) trained from a large-scale dataset. By using VQA, it is possible to intuitively describe robotic state recognition in the spoken language. On the other hand, there are various possible ways to ask about the same event, and the performance of state recognition differs depending on the question. Therefore, in order to improve the performance of state recognition using VQA, we search for an appropriate combination of questions using a genetic algorithm. We show that our system can recognize not only the open/closed of a refrigerator door and the on/off of a display, but also the open/closed of a transparent door and the state of water, which have been difficult to recognize.

accuracy, image look, recognition, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA48891.2023.10160390

2303.05052

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Information Compression and Performance Evaluation of Tic-Tac-Toe's Evaluation Function Using Singular Value Decomposition

Fujita, Naoya, Watanabe, Hiroshi

arXiv.org Artificial IntelligenceDec-2-2022

We approximated the evaluation function for the game Tic-Tac-Toe by singular value decomposition (SVD) and investigated the effect of approximation accuracy on winning rate. We first prepared the perfect evaluation function of Tic-Tac-Toe and performed low-rank approximation by considering the evaluation function as a ninth-order tensor. We found that we can reduce the amount of information of the evaluation function by 70% without significantly degrading the performance. Approximation accuracy and winning rate were strongly correlated but not perfectly proportional. We also investigated how the decomposition method of the evaluation function affects the performance. We considered two decomposition methods: simple SVD regarding the evaluation function as a matrix and the Tucker decomposition by higher-order SVD (HOSVD). At the same compression ratio, the strategy with the approximated evaluation function obtained by HOSVD exhibited a significantly higher winning rate than that obtained by SVD. These results suggest that SVD can effectively compress board game strategies and an optimal compression method that depends on the game exists.

artificial intelligence, evaluation function, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.7566/JPSJ.92.034802

2207.02449

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Tic-Tac-Toe (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Tree Index: A New Cluster Evaluation Technique

Beg, A. H., Islam, Md Zahidul, Estivill-Castro, Vladimir

arXiv.org Machine LearningMar-24-2020

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation power of clustering is some cumulative error similar to vector quantization). Our Tree Index is finding margins amongst clusters for easy learning without the complications of Minimum Description Length. Our Tree Index produces a decision tree from the clustered data set, using the cluster identifiers as labels. It combines the entropy of each leaf with their depth. Intuitively, a shorter tree with pure leaves generalizes the data well (the clusters are easy to learn because they are well separated). So, the labels are meaningful clusters. If the clustering algorithm does not separate well, trees learned from their results will be large and too detailed. We show that, on the clustering results (obtained by various techniques) on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters. We confirm the effectiveness of Tree Index through graphical visualizations. Tree Index evaluates the sensible solutions higher than the non-sensible solutions while existing cluster-quality indexes fail to do so.

clustering result, evaluation value, tree index, (15 more...)

arXiv.org Machine Learning

2003.10841

Country:

Oceania > Australia (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback